read and print settings

## settings:
## 
## bastoolsDir :  chr "/home/tutorial/TOOLS/bastools/"
## email :  chr "basegeter@gmail.com"
## filter_dxn :  num 20
## filter_dxn2 :  num 0
## neg.groups :  chr [1:2] "Sample_Plate" "extraction_batch"
## neg.types :  chr [1:2] "PCR_negative" "Extraction_Negative"
## plotting.vars :  chr [1:2] "biomaterial" "bleach_step"
## problemTaxa :  chr "NothingToAdd"
## real :  chr "swab"
## remove.entire.dataset :  logi TRUE
## rep.rm :  chr "biomaterial"
## rep.rm.first :  logi FALSE
## rep.rm.second :  logi FALSE
## rm.only.less.than :  logi TRUE
## samplepc :  num 0.5
## subsetlist : List of 5
##  $ experiment_id : chr "HSJUN19BAS"
##  $ primer_set    : chr "trnL"
##  $ MPLX          : chr "N"
##  $ substudy      : chr "swabtest"
##  $ Replicate_Name: chr "ZYMO"
## sumrepsby :  chr "biomaterial"
## taxa.to.group :  NULL
## taxatabs :  chr "/media/sf_Documents/WORK/G-DRIVE/G-WORK/SHARED_FOLDERS/CRAYFISH/rebin/CRAY-HSJUN19BAS_trnl.none.flash2.vsearch_"| __truncated__
## taxonpc :  num 0
## unwantedTaxa :  chr "NothingToAdd"
## url :  chr "https://docs.google.com/spreadsheets/d/1KZLoXHTgtkD0btSWjyAmFiGJ_cPcYITyfFSlzehisRI/edit#gid=1531090624"
## use.contamination.filter :  logi TRUE
## xLevel :  chr "family"
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

organise mastersheet

## Reading from 'CRAYFISH_FINAL_datasheet'
## Range "'Master_Samplesheet'"
## New names:
## * index -> index...6
## * index -> index...8
## Removing duplicated columns
## Checks only include the mostly used headers, please check to see all desired headers exist
## Subsetting datasheet
## $experiment_id
## [1] "HSJUN19BAS"
## 
## $primer_set
## [1] "trnL"
## 
## $MPLX
## [1] "N"
## 
## $substudy
## [1] "swabtest"
## 
## $Replicate_Name
## [1] "ZYMO"
## , , MPLX = N, substudy = swabtest, Replicate_Name = ZYMO
## 
##              primer_set
## experiment_id trnL
##    HSJUN19BAS   44
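The subsetting step keeps only the mastersheet rows that match every entry of `subsetlist`. A minimal Python sketch of that idea follows; it is not the bastools R code, and the rows and the `sample_id` column are made up for illustration (only the `subsetlist` keys and values come from the log).

```python
# Illustrative mastersheet rows; "sample_id" is a hypothetical column.
mastersheet = [
    {"sample_id": "A", "experiment_id": "HSJUN19BAS", "primer_set": "trnL",
     "MPLX": "N", "substudy": "swabtest", "Replicate_Name": "ZYMO"},
    {"sample_id": "B", "experiment_id": "HSJUN19BAS", "primer_set": "trnL",
     "MPLX": "Y", "substudy": "swabtest", "Replicate_Name": "ZYMO"},
    {"sample_id": "C", "experiment_id": "OTHER", "primer_set": "trnL",
     "MPLX": "N", "substudy": "swabtest", "Replicate_Name": "ZYMO"},
]

# Values as printed in the log.
subsetlist = {
    "experiment_id": "HSJUN19BAS",
    "primer_set": "trnL",
    "MPLX": "N",
    "substudy": "swabtest",
    "Replicate_Name": "ZYMO",
}


def subset_datasheet(rows, subsetlist):
    """Keep rows that match the requested value in every subsetlist column."""
    return [row for row in rows
            if all(row.get(col) == value for col, value in subsetlist.items())]


subset = subset_datasheet(mastersheet, subsetlist)
print([row["sample_id"] for row in subset])
```

Only row "A" satisfies all five conditions, which mirrors how the log ends with a single cell of the cross-tabulation (44 samples) remaining.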

import taxatabs

## Loading required package: tidyverse
## ── Attaching packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ tibble  2.1.3     ✓ purrr   0.3.3
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ── Conflicts ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## /media/sf_Documents/WORK/G-DRIVE/G-WORK/SHARED_FOLDERS/CRAYFISH/rebin/CRAY-HSJUN19BAS_trnl.none.flash2.vsearch_qfilt.cutadapt.vsearch_uniq.vsearch_afilt.allsamples_step5.ALL_vsearch_uniq.nodenoise.noclust.rebins.taxatable.tf.spliced.ALTEREDNAMES.txt
## reads: 14590984, taxa: 784, samples: 729
## Read counts all good
## merged taxatable
## reads: 14590984, taxa: 784, samples: 729
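The import step reports total reads, taxa and samples before and after merging. The sketch below shows one way such a summary and a read-count check could be computed; the nested-dict table layout (taxon -> sample -> reads), the function names and the toy counts are assumptions for illustration, not the bastools implementation.

```python
def summarise(taxatab):
    """Return (total reads, number of taxa, number of samples)."""
    reads = sum(sum(counts.values()) for counts in taxatab.values())
    samples = {s for counts in taxatab.values() for s in counts}
    return reads, len(taxatab), len(samples)


def merge_taxatabs(tables):
    """Sum read counts per taxon and sample across several taxa tables."""
    merged = {}
    for table in tables:
        for taxon, counts in table.items():
            row = merged.setdefault(taxon, {})
            for sample, n in counts.items():
                row[sample] = row.get(sample, 0) + n
    # Merging must not create or lose reads.
    if summarise(merged)[0] != sum(summarise(t)[0] for t in tables):
        raise ValueError("read counts changed during merge")
    return merged


# Toy tables with made-up counts.
tab1 = {"taxA": {"s1": 10, "s2": 0}, "taxB": {"s1": 5, "s2": 7}}
tab2 = {"taxA": {"s3": 4}, "taxC": {"s3": 1}}

merged = merge_taxatabs([tab1, tab2])
reads, taxa, samples = summarise(merged)
print(f"reads: {reads}, taxa: {taxa}, samples: {samples}")
```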

remove problem taxa - usually human, predator, NAs and no_hits

## Removing no_hits
## Removing NAs - sequences that had blast hits but were not assigned to any taxon
## Removing problem taxa
## Checking negs
## Ignoring the following taxa: NA;NA;NA;NA;NA;NA;NA & no_hits;no_hits;no_hits;no_hits;no_hits;no_hits;no_hits
## from 2 Extraction_Negative samples 2 contained reads
##  taxon                                                                                     CRAY-ECsZY-ZYMO-1-trnL-HSBAS CRAY-ECsZY-ZYMO-2-trnL-HSBAS
##  Eukaryota;Streptophyta;Magnoliopsida;Apiales;Apiaceae;Crithmum;Crithmum maritimum            5                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Apiales;Apiaceae;Cymopterus;Cymopterus sessiliflorus    2                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Apiales;Apiaceae;Meum;Meum athamanticum                 4                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Apiales;Apiaceae;NA;NA                               1181                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Apiales;Araliaceae;NA;NA                                0                         287
##  Eukaryota;Streptophyta;Magnoliopsida;Fabales;Fabaceae;NA;NA                                139                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Fagales;Fagaceae;Castanea;NA                            9                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Fagales;Fagaceae;Castanopsis;Castanopsis carlesii       2                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Fagales;Fagaceae;NA;NA                                778                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Poales;Poaceae;Danthoniopsis;Danthoniopsis dinteri      9                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Poales;Poaceae;Dinebra;NA                               2                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Poales;Poaceae;Festuca;Festuca kolymensis               3                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Poales;Poaceae;Muhlenbergia;NA                         13                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Poales;Poaceae;NA;NA                                 1118                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Poales;Poaceae;Paspalum;Paspalum dilatatum              2                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Poales;Poaceae;Paspalum;Paspalum notatum                7                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Poales;Poaceae;Paspalum;Paspalum thunbergii             9                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Ranunculales;Ranunculaceae;NA;NA                       18                           0
##  Eukaryota;Streptophyta;Magnoliopsida;unknown;NA;NA;NA                                     4475                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Vitales;Vitaceae;Cissus;NA                              0                          12
##  Eukaryota;Streptophyta;Magnoliopsida;Vitales;Vitaceae;NA;NA                                  0                         843
## from 2 PCR_negative samples 1 contained reads
##  taxon                                                                                    CRAY-34NC2-ZYMO-2-trnL-HSBAS
##  Eukaryota;Streptophyta;Magnoliopsida;Brassicales;Brassicaceae;Brassica;Brassica oleracea    2
##  Eukaryota;Streptophyta;Magnoliopsida;Brassicales;Brassicaceae;Brassica;Brassica souliei     2
##  Eukaryota;Streptophyta;Magnoliopsida;Brassicales;Brassicaceae;NA;NA                       796
##  Eukaryota;Streptophyta;Magnoliopsida;Rosales;Rosaceae;Rubus;Rubus idaeus                    2
##  Eukaryota;Streptophyta;Pinopsida;Pinales;Pinaceae;Picea;NA                                 12
##  Eukaryota;Streptophyta;Pinopsida;Pinales;Pinaceae;Picea;Picea neoveitchii                   8
##  Eukaryota;Streptophyta;Polypodiopsida;Polypodiales;Athyriaceae;Athyrium;NA                486
##  Eukaryota;Streptophyta;unknown;unknown;NA;NA;NA                                          2333
## Making quantile plots to help deciding dxn_filter threshold
## Using taxon as id variables
## Using taxon as id variables

## Making barplots for only those taxa detected in both negatives and real samples - to help decide taxon filter and rm.contaminants filter
## Using taxon as id variables

## Making barplots for only those taxa detected in both negatives and real samples - to help decide sample filter
## Using taxon as id variables
## Using taxon as id variables
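The "remove problem taxa" step drops the all-`no_hits` and all-`NA` paths plus any user-listed taxa, then reports which taxa still have reads in the negative controls. A small Python sketch of both ideas follows; the substring matching of problem taxa, the function names and the toy counts are assumptions, not the bastools code.

```python
NO_HITS = ";".join(["no_hits"] * 7)  # path ignored in the log
ALL_NA = ";".join(["NA"] * 7)        # hits that were not assigned to any taxon


def remove_problem_taxa(taxatab, problem_taxa=()):
    """Drop no_hits, all-NA and any taxon path containing a listed problem taxon."""
    kept = {}
    for taxon, counts in taxatab.items():
        if taxon in (NO_HITS, ALL_NA):
            continue
        if any(name in taxon for name in problem_taxa):
            continue
        kept[taxon] = counts
    return kept


def negatives_with_reads(taxatab, negative_samples):
    """Map each negative that contains reads to its taxa and read counts."""
    report = {}
    for neg in negative_samples:
        hits = {t: c[neg] for t, c in taxatab.items() if c.get(neg, 0) > 0}
        if hits:
            report[neg] = hits
    return report


# Toy table (taxon -> sample -> reads) with made-up counts.
poaceae = "Eukaryota;Streptophyta;Magnoliopsida;Poales;Poaceae;NA;NA"
human = "Eukaryota;Chordata;Mammalia;Primates;Hominidae;Homo;Homo sapiens"
taxatab = {
    NO_HITS: {"neg1": 3, "swab1": 40},
    ALL_NA: {"neg1": 0, "swab1": 12},
    poaceae: {"neg1": 7, "swab1": 900},
    human: {"neg1": 0, "swab1": 55},
}

cleaned = remove_problem_taxa(taxatab, problem_taxa=["Homo sapiens"])
print(negatives_with_reads(cleaned, ["neg1"]))
```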

apply taxon and sample filters

## Applying taxon_pc filter.
## Using filter of 0 %. reads removed: 0 from 247738 ; detections removed: 0 from 471
## Applying sample_pc filter. Note: this removes samples with no reads
## Using filter of 0.5 %. reads removed: 1647 from 247738 ; detections removed: 286 from 471
## Checking negs
## Ignoring the following taxa: NA;NA;NA;NA;NA;NA;NA & no_hits;no_hits;no_hits;no_hits;no_hits;no_hits;no_hits
## from 2 Extraction_Negative samples 2 contained reads
##  taxon                                                           CRAY-ECsZY-ZYMO-1-trnL-HSBAS CRAY-ECsZY-ZYMO-2-trnL-HSBAS
##  Eukaryota;Streptophyta;Magnoliopsida;Apiales;Apiaceae;NA;NA     1181                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Apiales;Araliaceae;NA;NA      0                         287
##  Eukaryota;Streptophyta;Magnoliopsida;Fabales;Fabaceae;NA;NA      139                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Fagales;Fagaceae;NA;NA      778                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Poales;Poaceae;NA;NA       1118                           0
##  Eukaryota;Streptophyta;Magnoliopsida;unknown;NA;NA;NA           4475                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Vitales;Vitaceae;Cissus;NA    0                          12
##  Eukaryota;Streptophyta;Magnoliopsida;Vitales;Vitaceae;NA;NA        0                         843
## from 2 PCR_negative samples 1 contained reads
##  taxon                                                                      CRAY-34NC2-ZYMO-2-trnL-HSBAS
##  Eukaryota;Streptophyta;Magnoliopsida;Brassicales;Brassicaceae;NA;NA         796
##  Eukaryota;Streptophyta;Polypodiopsida;Polypodiales;Athyriaceae;Athyrium;NA  486
##  Eukaryota;Streptophyta;unknown;unknown;NA;NA;NA                            2333
## Making quantile plots to help deciding dxn_filter threshold
## Using taxon as id variables
## Using taxon as id variables

## Making barplots for only those taxa detected in both negatives and real samples - to help decide taxon filter and rm.contaminants filter
## Using taxon as id variables

## Making barplots for only those taxa detected in both negatives and real samples - to help decide sample filter
## Using taxon as id variables
## Using taxon as id variables
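The two percentage filters can be read as follows: `taxonpc` zeroes a detection that is below the given percentage of its taxon's total reads, and `samplepc` zeroes a detection that is below the given percentage of its sample's total reads. That reading is consistent with the negatives above, where the low-count detections vanish after the 0.5 % sample filter, but the sketch below is still an assumption about the logic, not the bastools code; function names and toy counts are illustrative.

```python
def taxon_pc_filter(taxatab, taxonpc):
    """Zero detections below taxonpc percent of the taxon's total reads."""
    out = {}
    for taxon, counts in taxatab.items():
        total = sum(counts.values())
        out[taxon] = {s: (n if total > 0 and 100 * n / total >= taxonpc else 0)
                      for s, n in counts.items()}
    return out


def sample_totals(taxatab):
    """Total reads per sample."""
    totals = {}
    for counts in taxatab.values():
        for s, n in counts.items():
            totals[s] = totals.get(s, 0) + n
    return totals


def sample_pc_filter(taxatab, samplepc):
    """Zero detections below samplepc percent of the sample's total reads,
    then drop samples that have no reads at all."""
    totals = sample_totals(taxatab)
    filtered = {
        taxon: {s: (n if totals[s] > 0 and 100 * n / totals[s] >= samplepc else 0)
                for s, n in counts.items()}
        for taxon, counts in taxatab.items()
    }
    remaining = sample_totals(filtered)
    return {taxon: {s: n for s, n in counts.items() if remaining[s] > 0}
            for taxon, counts in filtered.items()}


# Toy table with made-up counts: taxB is 0.4 % of s1, and s2 is empty.
taxatab = {"taxA": {"s1": 996, "s2": 0}, "taxB": {"s1": 4, "s2": 0}}

step1 = taxon_pc_filter(taxatab, taxonpc=0)      # 0 % removes nothing
step2 = sample_pc_filter(step1, samplepc=0.5)
print(step2)
```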

remove detections found in fewer than 2 replicates (only done here if rep.rm.first is TRUE)

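PCR negatives are exempt if the negative option is set (this only makes sense before the rm.contaminants function is run, or when rm.contaminants is not used at all). rep.rm.first is FALSE in the settings above, so no output is shown for this step.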
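As a rough illustration of the replicate rule, the Python sketch below keeps a taxon in a group of replicates only when it was detected in at least two of them, and leaves exempt samples (for example PCR negatives) untouched. The grouping by a `sample_group` mapping (standing in for the `rep.rm` column), the function name and the counts are assumptions for illustration.

```python
def remove_unreplicated(taxatab, sample_group, min_reps=2, exempt=()):
    """Zero detections of a taxon seen in fewer than min_reps replicates of a group."""
    out = {}
    for taxon, counts in taxatab.items():
        # Number of replicates per group in which this taxon has reads.
        hits = {}
        for sample, n in counts.items():
            if n > 0:
                group = sample_group[sample]
                hits[group] = hits.get(group, 0) + 1
        out[taxon] = {
            sample: (n if sample in exempt
                     or hits.get(sample_group[sample], 0) >= min_reps else 0)
            for sample, n in counts.items()
        }
    return out


# Hypothetical replicate layout: two PCR replicates per swab plus one PCR negative.
sample_group = {
    "swab1_rep1": "swab1", "swab1_rep2": "swab1",
    "swab2_rep1": "swab2", "swab2_rep2": "swab2",
    "pcr_neg": "pcr_neg",
}
taxatab = {"taxA": {"swab1_rep1": 50, "swab1_rep2": 30,
                    "swab2_rep1": 20, "swab2_rep2": 0, "pcr_neg": 5}}

filtered = remove_unreplicated(taxatab, sample_group, exempt=("pcr_neg",))
print(filtered)
```

Exempting the negatives keeps their single-replicate reads visible so that a later contaminant filter can still act on them.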

apply detection filter

## Applying detection filter
## Using detection filter of 20 : reads removed: 65 from 246091 ; detections removed: 5 from 185
## Checking negs
## Ignoring the following taxa: NA;NA;NA;NA;NA;NA;NA & no_hits;no_hits;no_hits;no_hits;no_hits;no_hits;no_hits
## from 2 Extraction_Negative samples 2 contained reads
##  taxon                                                         CRAY-ECsZY-ZYMO-1-trnL-HSBAS CRAY-ECsZY-ZYMO-2-trnL-HSBAS
##  Eukaryota;Streptophyta;Magnoliopsida;Apiales;Apiaceae;NA;NA   1181                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Apiales;Araliaceae;NA;NA    0                         287
##  Eukaryota;Streptophyta;Magnoliopsida;Fabales;Fabaceae;NA;NA    139                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Fagales;Fagaceae;NA;NA    778                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Poales;Poaceae;NA;NA     1118                           0
##  Eukaryota;Streptophyta;Magnoliopsida;unknown;NA;NA;NA         4475                           0
##  Eukaryota;Streptophyta;Magnoliopsida;Vitales;Vitaceae;NA;NA      0                         843
## from 2 PCR_negative samples 1 contained reads
##  taxon                                                                      CRAY-34NC2-ZYMO-2-trnL-HSBAS
##  Eukaryota;Streptophyta;Magnoliopsida;Brassicales;Brassicaceae;NA;NA         796
##  Eukaryota;Streptophyta;Polypodiopsida;Polypodiales;Athyriaceae;Athyrium;NA  486
##  Eukaryota;Streptophyta;unknown;unknown;NA;NA;NA                            2333
## Making quantile plots to help deciding dxn_filter threshold
## Using taxon as id variables
## Using taxon as id variables

## Making barplots for only those taxa detected in both negatives and real samples - to help decide taxon filter and rm.contaminants filter
## Using taxon as id variables

## Making barplots for only those taxa detected in both negatives and real samples - to help decide sample filter
## Using taxon as id variables
## Using taxon as id variables
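The detection filter appears to zero any detection with fewer reads than `filter_dxn`; in the log, the 12-read Cissus detection in the extraction negative is gone after the filter of 20. A minimal Python sketch under that assumption (strictly "less than" is a guess, and the function name and counts are illustrative, not bastools code):

```python
def detection_filter(taxatab, filter_dxn):
    """Zero detections with fewer than filter_dxn reads and tally what was removed."""
    out = {}
    reads_removed = 0
    detections_removed = 0
    for taxon, counts in taxatab.items():
        out[taxon] = {}
        for sample, n in counts.items():
            if 0 < n < filter_dxn:
                reads_removed += n
                detections_removed += 1
                out[taxon][sample] = 0
            else:
                out[taxon][sample] = n
    return out, reads_removed, detections_removed


# Toy table with made-up counts.
taxatab = {"taxA": {"s1": 19, "s2": 20}, "taxB": {"s1": 0, "s2": 3}}

filtered, reads_removed, detections_removed = detection_filter(taxatab, filter_dxn=20)
print(f"reads removed: {reads_removed}; detections removed: {detections_removed}")
```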

remove contaminants from relevant groups
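Going by the log messages for this step, each negative that contains reads defines a set of contaminant taxa, and those taxa are removed from samples that share the negative's group (`extraction_batch` for extraction negatives, `Sample_Plate` for PCR negatives). With `rm.only.less.than` set, only detections with fewer reads than the negative are removed. The Python sketch below illustrates that reading; it is an assumption about the logic, not the bastools code, it is not expected to reproduce the totals printed in the log, and all names and counts are made up.

```python
def remove_contaminants(taxatab, negatives, sample_group, rm_only_less_than=True):
    """Zero contaminant detections in samples that share a group with a negative.

    sample_group maps every sample (negatives included) to its group id.
    Returns the filtered table and the number of reads removed.
    """
    out = {taxon: dict(counts) for taxon, counts in taxatab.items()}
    reads_removed = 0
    for neg in negatives:
        members = [s for s, g in sample_group.items() if g == sample_group[neg]]
        for taxon, counts in taxatab.items():
            neg_reads = counts.get(neg, 0)
            if neg_reads == 0:
                continue  # taxon not present in this negative
            for sample in members:
                n = out[taxon].get(sample, 0)
                if n == 0:
                    continue
                # The negative itself is always cleared; other samples only when
                # they have fewer reads than the negative (if the option is set).
                if sample == neg or not rm_only_less_than or n < neg_reads:
                    out[taxon][sample] = 0
                    reads_removed += n
    return out, reads_removed


# Toy data: one extraction negative and two swabs in batch1, one swab in batch2.
sample_group = {"ext_neg": "batch1", "swab1": "batch1",
                "swab2": "batch1", "swab3": "batch2"}
taxatab = {
    "taxA": {"ext_neg": 100, "swab1": 50, "swab2": 500, "swab3": 10},
    "taxB": {"ext_neg": 0, "swab1": 70, "swab2": 0, "swab3": 0},
}

cleaned, reads_removed = remove_contaminants(taxatab, ["ext_neg"], sample_group)
print(cleaned, reads_removed)
```

In this toy case swab2 keeps its 500 reads of taxA because it exceeds the negative, and swab3 is untouched because it belongs to a different batch.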

## Applying contaminant filter
## Only removing detections which had a read count less than the negtive
## Based on Extraction_Negative CRAY-ECsZY-ZYMO-1-trnL-HSBAS , Removing detections of
## [1] "Eukaryota;Streptophyta;Magnoliopsida;Apiales;Apiaceae;NA;NA"
## [2] "Eukaryota;Streptophyta;Magnoliopsida;Fabales;Fabaceae;NA;NA"
## [3] "Eukaryota;Streptophyta;Magnoliopsida;Fagales;Fagaceae;NA;NA"
## [4] "Eukaryota;Streptophyta;Magnoliopsida;Poales;Poaceae;NA;NA"  
## [5] "Eukaryota;Streptophyta;Magnoliopsida;unknown;NA;NA;NA"
## if it occurred in any samples belonging to extraction_batch:batchSWB_ZY
## 34479 reads removed from 21 detection(s) in 26 sample(s), of which 7691 were in the negative. See table below for details:
##                                 Eukaryota;Streptophyta;Magnoliopsida;Apiales;Apiaceae;NA;NA
## CRAY-ECsZY-ZYMO-1-trnL-HSBAS    1181                                                       
## CRAY-SWB1_P10-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB1_P12-ZYMO-1-trnL-HSBAS    0                                                       
## CRAY-SWB1_P12-ZYMO-2-trnL-HSBAS 3767                                                       
## CRAY-SWB1_P14-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB1_P16-ZYMO-2-trnL-HSBAS 1563                                                       
## CRAY-SWB1_P19-ZYMO-1-trnL-HSBAS    0                                                       
## CRAY-SWB1_P19-ZYMO-2-trnL-HSBAS   0                                                        
## CRAY-SWB1_P2-ZYMO-2-trnL-HSBAS    0                                                        
## CRAY-SWB1_P21-ZYMO-1-trnL-HSBAS    0                                                       
## CRAY-SWB1_P21-ZYMO-2-trnL-HSBAS   0                                                        
## CRAY-SWB1_P4-ZYMO-1-trnL-HSBAS  4011                                                       
## CRAY-SWB1_P6-ZYMO-2-trnL-HSBAS     0                                                       
## CRAY-SWB1_P8-ZYMO-1-trnL-HSBAS  4863                                                       
## CRAY-SWB1_P8-ZYMO-2-trnL-HSBAS     0                                                       
## CRAY-SWB2_P10-ZYMO-1-trnL-HSBAS    0                                                       
## CRAY-SWB2_P10-ZYMO-2-trnL-HSBAS 1280                                                       
## CRAY-SWB2_P12-ZYMO-1-trnL-HSBAS  762                                                       
## CRAY-SWB2_P12-ZYMO-2-trnL-HSBAS 1407                                                       
## CRAY-SWB2_P19-ZYMO-1-trnL-HSBAS    0                                                       
## CRAY-SWB2_P19-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB2_P2-ZYMO-1-trnL-HSBAS  4391                                                       
## CRAY-SWB2_P4-ZYMO-2-trnL-HSBAS     0                                                       
## CRAY-SWB2_P6-ZYMO-1-trnL-HSBAS  1756                                                       
## CRAY-SWB2_P6-ZYMO-2-trnL-HSBAS   847                                                       
## CRAY-SWB2_P8-ZYMO-1-trnL-HSBAS  705                                                        
##                                 Eukaryota;Streptophyta;Magnoliopsida;Fabales;Fabaceae;NA;NA
## CRAY-ECsZY-ZYMO-1-trnL-HSBAS     139                                                       
## CRAY-SWB1_P10-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB1_P12-ZYMO-1-trnL-HSBAS    0                                                       
## CRAY-SWB1_P12-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB1_P14-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB1_P16-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB1_P19-ZYMO-1-trnL-HSBAS    0                                                       
## CRAY-SWB1_P19-ZYMO-2-trnL-HSBAS 262                                                        
## CRAY-SWB1_P2-ZYMO-2-trnL-HSBAS  541                                                        
## CRAY-SWB1_P21-ZYMO-1-trnL-HSBAS 2868                                                       
## CRAY-SWB1_P21-ZYMO-2-trnL-HSBAS 835                                                        
## CRAY-SWB1_P4-ZYMO-1-trnL-HSBAS     0                                                       
## CRAY-SWB1_P6-ZYMO-2-trnL-HSBAS     0                                                       
## CRAY-SWB1_P8-ZYMO-1-trnL-HSBAS     0                                                       
## CRAY-SWB1_P8-ZYMO-2-trnL-HSBAS     0                                                       
## CRAY-SWB2_P10-ZYMO-1-trnL-HSBAS    0                                                       
## CRAY-SWB2_P10-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB2_P12-ZYMO-1-trnL-HSBAS  685                                                       
## CRAY-SWB2_P12-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB2_P19-ZYMO-1-trnL-HSBAS    0                                                       
## CRAY-SWB2_P19-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB2_P2-ZYMO-1-trnL-HSBAS     0                                                       
## CRAY-SWB2_P4-ZYMO-2-trnL-HSBAS     0                                                       
## CRAY-SWB2_P6-ZYMO-1-trnL-HSBAS     0                                                       
## CRAY-SWB2_P6-ZYMO-2-trnL-HSBAS     0                                                       
## CRAY-SWB2_P8-ZYMO-1-trnL-HSBAS    0                                                        
##                                 Eukaryota;Streptophyta;Magnoliopsida;Fagales;Fagaceae;NA;NA
## CRAY-ECsZY-ZYMO-1-trnL-HSBAS     778                                                       
## CRAY-SWB1_P10-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB1_P12-ZYMO-1-trnL-HSBAS    0                                                       
## CRAY-SWB1_P12-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB1_P14-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB1_P16-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB1_P19-ZYMO-1-trnL-HSBAS    0                                                       
## CRAY-SWB1_P19-ZYMO-2-trnL-HSBAS   0                                                        
## CRAY-SWB1_P2-ZYMO-2-trnL-HSBAS    0                                                        
## CRAY-SWB1_P21-ZYMO-1-trnL-HSBAS 1053                                                       
## CRAY-SWB1_P21-ZYMO-2-trnL-HSBAS 236                                                        
## CRAY-SWB1_P4-ZYMO-1-trnL-HSBAS     0                                                       
## CRAY-SWB1_P6-ZYMO-2-trnL-HSBAS     0                                                       
## CRAY-SWB1_P8-ZYMO-1-trnL-HSBAS     0                                                       
## CRAY-SWB1_P8-ZYMO-2-trnL-HSBAS     0                                                       
## CRAY-SWB2_P10-ZYMO-1-trnL-HSBAS    0                                                       
## CRAY-SWB2_P10-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB2_P12-ZYMO-1-trnL-HSBAS    0                                                       
## CRAY-SWB2_P12-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB2_P19-ZYMO-1-trnL-HSBAS    0                                                       
## CRAY-SWB2_P19-ZYMO-2-trnL-HSBAS    0                                                       
## CRAY-SWB2_P2-ZYMO-1-trnL-HSBAS     0                                                       
## CRAY-SWB2_P4-ZYMO-2-trnL-HSBAS     0                                                       
## CRAY-SWB2_P6-ZYMO-1-trnL-HSBAS     0                                                       
## CRAY-SWB2_P6-ZYMO-2-trnL-HSBAS     0                                                       
## CRAY-SWB2_P8-ZYMO-1-trnL-HSBAS    0                                                        
##                                 Eukaryota;Streptophyta;Magnoliopsida;Poales;Poaceae;NA;NA
## CRAY-ECsZY-ZYMO-1-trnL-HSBAS    1118                                                     
## CRAY-SWB1_P10-ZYMO-2-trnL-HSBAS    0                                                     
## CRAY-SWB1_P12-ZYMO-1-trnL-HSBAS 1435                                                     
## CRAY-SWB1_P12-ZYMO-2-trnL-HSBAS    0                                                     
## CRAY-SWB1_P14-ZYMO-2-trnL-HSBAS    0                                                     
## CRAY-SWB1_P16-ZYMO-2-trnL-HSBAS  410                                                     
## CRAY-SWB1_P19-ZYMO-1-trnL-HSBAS 1991                                                     
## CRAY-SWB1_P19-ZYMO-2-trnL-HSBAS   0                                                      
## CRAY-SWB1_P2-ZYMO-2-trnL-HSBAS    0                                                      
## CRAY-SWB1_P21-ZYMO-1-trnL-HSBAS    0                                                     
## CRAY-SWB1_P21-ZYMO-2-trnL-HSBAS   0                                                      
## CRAY-SWB1_P4-ZYMO-1-trnL-HSBAS     0                                                     
## CRAY-SWB1_P6-ZYMO-2-trnL-HSBAS     0                                                     
## CRAY-SWB1_P8-ZYMO-1-trnL-HSBAS     0                                                     
## CRAY-SWB1_P8-ZYMO-2-trnL-HSBAS     0                                                     
## CRAY-SWB2_P10-ZYMO-1-trnL-HSBAS 1799                                                     
## CRAY-SWB2_P10-ZYMO-2-trnL-HSBAS  542                                                     
## CRAY-SWB2_P12-ZYMO-1-trnL-HSBAS    0                                                     
## CRAY-SWB2_P12-ZYMO-2-trnL-HSBAS    0                                                     
## CRAY-SWB2_P19-ZYMO-1-trnL-HSBAS    0                                                     
## CRAY-SWB2_P19-ZYMO-2-trnL-HSBAS    0                                                     
## CRAY-SWB2_P2-ZYMO-1-trnL-HSBAS     0                                                     
## CRAY-SWB2_P4-ZYMO-2-trnL-HSBAS   311                                                     
## CRAY-SWB2_P6-ZYMO-1-trnL-HSBAS  2881                                                     
## CRAY-SWB2_P6-ZYMO-2-trnL-HSBAS     0                                                     
## CRAY-SWB2_P8-ZYMO-1-trnL-HSBAS  217                                                      
##                                 Eukaryota;Streptophyta;Magnoliopsida;unknown;NA;NA;NA
## CRAY-ECsZY-ZYMO-1-trnL-HSBAS    4475                                                 
## CRAY-SWB1_P10-ZYMO-2-trnL-HSBAS 1289                                                 
## CRAY-SWB1_P12-ZYMO-1-trnL-HSBAS    0                                                 
## CRAY-SWB1_P12-ZYMO-2-trnL-HSBAS 3155                                                 
## CRAY-SWB1_P14-ZYMO-2-trnL-HSBAS 1869                                                 
## CRAY-SWB1_P16-ZYMO-2-trnL-HSBAS    0                                                 
## CRAY-SWB1_P19-ZYMO-1-trnL-HSBAS 2942                                                 
## CRAY-SWB1_P19-ZYMO-2-trnL-HSBAS   0                                                  
## CRAY-SWB1_P2-ZYMO-2-trnL-HSBAS  943                                                  
## CRAY-SWB1_P21-ZYMO-1-trnL-HSBAS    0                                                 
## CRAY-SWB1_P21-ZYMO-2-trnL-HSBAS   0                                                  
## CRAY-SWB1_P4-ZYMO-1-trnL-HSBAS     0                                                 
## CRAY-SWB1_P6-ZYMO-2-trnL-HSBAS  1490                                                 
## CRAY-SWB1_P8-ZYMO-1-trnL-HSBAS  2494                                                 
## CRAY-SWB1_P8-ZYMO-2-trnL-HSBAS  1353                                                 
## CRAY-SWB2_P10-ZYMO-1-trnL-HSBAS    0                                                 
## CRAY-SWB2_P10-ZYMO-2-trnL-HSBAS    0                                                 
## CRAY-SWB2_P12-ZYMO-1-trnL-HSBAS 2227                                                 
## CRAY-SWB2_P12-ZYMO-2-trnL-HSBAS    0                                                 
## CRAY-SWB2_P19-ZYMO-1-trnL-HSBAS 2088                                                 
## CRAY-SWB2_P19-ZYMO-2-trnL-HSBAS 1171                                                 
## CRAY-SWB2_P2-ZYMO-1-trnL-HSBAS     0                                                 
## CRAY-SWB2_P4-ZYMO-2-trnL-HSBAS  1005                                                 
## CRAY-SWB2_P6-ZYMO-1-trnL-HSBAS     0                                                 
## CRAY-SWB2_P6-ZYMO-2-trnL-HSBAS  2632                                                 
## CRAY-SWB2_P8-ZYMO-1-trnL-HSBAS    0
## Based on Extraction_Negative CRAY-ECsZY-ZYMO-2-trnL-HSBAS , Removing detections of
## [1] "Eukaryota;Streptophyta;Magnoliopsida;Apiales;Araliaceae;NA;NA"
## [2] "Eukaryota;Streptophyta;Magnoliopsida;Vitales;Vitaceae;NA;NA"
## if it occurred in any samples belonging to extraction_batch:batchSWB_ZY
## 5674 reads removed from 11 detection(s) in 21 sample(s), of which 1130 were in the negative. See table below for details:
##                                 Eukaryota;Streptophyta;Magnoliopsida;Apiales;Araliaceae;NA;NA
## CRAY-ECsZY-ZYMO-2-trnL-HSBAS    287                                                          
## CRAY-SWB1_P14-ZYMO-1-trnL-HSBAS   0                                                          
## CRAY-SWB1_P14-ZYMO-2-trnL-HSBAS   0                                                          
## CRAY-SWB1_P16-ZYMO-2-trnL-HSBAS    0                                                         
## CRAY-SWB1_P19-ZYMO-1-trnL-HSBAS    0                                                         
## CRAY-SWB1_P19-ZYMO-2-trnL-HSBAS   0                                                          
## CRAY-SWB1_P2-ZYMO-2-trnL-HSBAS    0                                                          
## CRAY-SWB1_P6-ZYMO-2-trnL-HSBAS    0                                                          
## CRAY-SWB1_P8-ZYMO-2-trnL-HSBAS     0                                                         
## CRAY-SWB2_P10-ZYMO-1-trnL-HSBAS   0                                                          
## CRAY-SWB2_P10-ZYMO-2-trnL-HSBAS   0                                                          
## CRAY-SWB2_P12-ZYMO-1-trnL-HSBAS   0                                                          
## CRAY-SWB2_P12-ZYMO-2-trnL-HSBAS    0                                                         
## CRAY-SWB2_P16-ZYMO-2-trnL-HSBAS    0                                                         
## CRAY-SWB2_P19-ZYMO-1-trnL-HSBAS    0                                                         
## CRAY-SWB2_P19-ZYMO-2-trnL-HSBAS 639                                                          
## CRAY-SWB2_P21-ZYMO-1-trnL-HSBAS   0                                                          
## CRAY-SWB2_P21-ZYMO-2-trnL-HSBAS   0                                                          
## CRAY-SWB2_P4-ZYMO-2-trnL-HSBAS    0                                                          
## CRAY-SWB2_P6-ZYMO-1-trnL-HSBAS     0                                                         
## CRAY-SWB2_P8-ZYMO-2-trnL-HSBAS    0                                                          
##                                 Eukaryota;Streptophyta;Magnoliopsida;Vitales;Vitaceae;NA;NA
## CRAY-ECsZY-ZYMO-2-trnL-HSBAS    843                                                        
## CRAY-SWB1_P14-ZYMO-1-trnL-HSBAS 884                                                        
## CRAY-SWB1_P14-ZYMO-2-trnL-HSBAS 449                                                        
## CRAY-SWB1_P16-ZYMO-2-trnL-HSBAS 1512                                                       
## CRAY-SWB1_P19-ZYMO-1-trnL-HSBAS 1448                                                       
## CRAY-SWB1_P19-ZYMO-2-trnL-HSBAS 796                                                        
## CRAY-SWB1_P2-ZYMO-2-trnL-HSBAS  214                                                        
## CRAY-SWB1_P6-ZYMO-2-trnL-HSBAS  609                                                        
## CRAY-SWB1_P8-ZYMO-2-trnL-HSBAS  1017                                                       
## CRAY-SWB2_P10-ZYMO-1-trnL-HSBAS 295                                                        
## CRAY-SWB2_P10-ZYMO-2-trnL-HSBAS 485                                                        
## CRAY-SWB2_P12-ZYMO-1-trnL-HSBAS 805                                                        
## CRAY-SWB2_P12-ZYMO-2-trnL-HSBAS 1004                                                       
## CRAY-SWB2_P16-ZYMO-2-trnL-HSBAS 1414                                                       
## CRAY-SWB2_P19-ZYMO-1-trnL-HSBAS 1638                                                       
## CRAY-SWB2_P19-ZYMO-2-trnL-HSBAS   0                                                        
## CRAY-SWB2_P21-ZYMO-1-trnL-HSBAS 322                                                        
## CRAY-SWB2_P21-ZYMO-2-trnL-HSBAS 302                                                        
## CRAY-SWB2_P4-ZYMO-2-trnL-HSBAS  261                                                        
## CRAY-SWB2_P6-ZYMO-1-trnL-HSBAS  1586                                                       
## CRAY-SWB2_P8-ZYMO-2-trnL-HSBAS  806
## Based on PCR_negative CRAY-34NC2-ZYMO-2-trnL-HSBAS , Removing detections of
## [1] "Eukaryota;Streptophyta;Magnoliopsida;Brassicales;Brassicaceae;NA;NA"       
## [2] "Eukaryota;Streptophyta;Polypodiopsida;Polypodiales;Athyriaceae;Athyrium;NA"
## [3] "Eukaryota;Streptophyta;unknown;unknown;NA;NA;NA"
## if it occurred in any samples belonging to Sample_Plate:34
## 11425 reads removed from 6 detection(s) in 6 sample(s), of which 3615 were in the negative. See table below for details:
##                                 Eukaryota;Streptophyta;Magnoliopsida;Brassicales;Brassicaceae;NA;NA
## CRAY-34NC2-ZYMO-2-trnL-HSBAS     796
## CRAY-SWB1_P14-ZYMO-2-trnL-HSBAS   82
## CRAY-SWB1_P2-ZYMO-2-trnL-HSBAS     0
## CRAY-SWB1_P6-ZYMO-2-trnL-HSBAS     0
## CRAY-SWB2_P21-ZYMO-2-trnL-HSBAS    0
## CRAY-SWB2_P6-ZYMO-2-trnL-HSBAS  1564
##                                 Eukaryota;Streptophyta;Polypodiopsida;Polypodiales;Athyriaceae;Athyrium;NA
## CRAY-34NC2-ZYMO-2-trnL-HSBAS     486
## CRAY-SWB1_P14-ZYMO-2-trnL-HSBAS    0
## CRAY-SWB1_P2-ZYMO-2-trnL-HSBAS     0
## CRAY-SWB1_P6-ZYMO-2-trnL-HSBAS  1669
## CRAY-SWB2_P21-ZYMO-2-trnL-HSBAS    0
## CRAY-SWB2_P6-ZYMO-2-trnL-HSBAS     0
##                                 Eukaryota;Streptophyta;unknown;unknown;NA;NA;NA
## CRAY-34NC2-ZYMO-2-trnL-HSBAS    2333
## CRAY-SWB1_P14-ZYMO-2-trnL-HSBAS    0
## CRAY-SWB1_P2-ZYMO-2-trnL-HSBAS   127
## CRAY-SWB1_P6-ZYMO-2-trnL-HSBAS     0
## CRAY-SWB2_P21-ZYMO-2-trnL-HSBAS   59
## CRAY-SWB2_P6-ZYMO-2-trnL-HSBAS     0
## ***********A total of 51578 reads removed
## ****Furthermore, removing the following contaminant taxa from entire dataset
## [1] "Eukaryota;Streptophyta;Magnoliopsida;Brassicales;Brassicaceae;NA;NA"       
## [2] "Eukaryota;Streptophyta;Polypodiopsida;Polypodiales;Athyriaceae;Athyrium;NA"
## [3] "Eukaryota;Streptophyta;unknown;unknown;NA;NA;NA"                           
## [4] "Eukaryota;Streptophyta;Magnoliopsida;Apiales;Araliaceae;NA;NA"             
## [5] "Eukaryota;Streptophyta;Magnoliopsida;Vitales;Vitaceae;NA;NA"
## ***********A further  976 reads removed
## Checking that all negatives are now clean
## Ignoring the following taxa: NA;NA;NA;NA;NA;NA;NA & no_hits;no_hits;no_hits;no_hits;no_hits;no_hits;no_hits
## No negatives found with reads
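
The log above describes the logic: any taxon with reads in a negative is removed from every sample in the same group (here, the same Sample_Plate). A minimal sketch of that idea follows. `remove_negative_taxa` is a hypothetical helper, not a bastools function, and the table layout (a `taxon` column followed by one column per sample) and the toy numbers are assumptions made for illustration.

```r
# Hypothetical helper (not part of bastools): within one group of samples,
# zero out every taxon that has reads in that group's negative control.
remove_negative_taxa <- function(taxatab, negative, group_samples) {
  counts <- as.matrix(taxatab[, -1, drop = FALSE])
  contaminated <- counts[, negative] > 0          # taxa seen in the negative
  cols <- union(negative, group_samples)          # negative + samples in its group
  removed <- sum(counts[contaminated, cols])
  counts[contaminated, cols] <- 0
  list(
    taxatab = data.frame(taxon = taxatab$taxon, counts, check.names = FALSE),
    taxa = as.character(taxatab$taxon[contaminated]),
    reads_removed = removed
  )
}

# Toy data with invented read counts
toy <- data.frame(
  taxon = c("Brassicaceae", "Poaceae", "Athyriaceae"),
  NEG1  = c(10, 0, 5),
  S1    = c(3, 40, 0),
  S2    = c(0, 25, 7),
  S3    = c(8, 12, 9)   # assumed to be on a different plate, so left untouched
)

res <- remove_negative_taxa(toy, negative = "NEG1", group_samples = c("S1", "S2"))
res$taxa
res$reads_removed
```

With remove.entire.dataset set to TRUE, the same taxa would then also be zeroed in the remaining samples (S3 in the toy table).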

remove detections found in fewer than 2 reps

PCR negatives are exempt if the negative option is set. This only makes sense when the step runs before the rm.contaminants function, or when rm.contaminants is not used at all.
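
A rough sketch of a replicate filter of this kind, assuming a taxa-by-samples count matrix and one group label per sample column. `filter_by_replicates` and its arguments are hypothetical names, not the bastools implementation.

```r
# Hypothetical helper: within each group of replicates, keep a taxon only if
# it was detected (reads > 0) in at least `min_reps` replicates.
filter_by_replicates <- function(counts, groups, min_reps = 2,
                                 exempt = character(0)) {
  for (g in unique(groups)) {
    cols <- which(groups == g)
    # Skip groups made up entirely of exempt samples (e.g. PCR negatives)
    if (all(colnames(counts)[cols] %in% exempt)) next
    n_detected <- rowSums(counts[, cols, drop = FALSE] > 0)
    counts[n_detected < min_reps, cols] <- 0
  }
  counts
}

# Toy counts: 2 taxa, 2 biomaterials with 2 replicates each (invented numbers)
counts <- matrix(
  c(5, 0,
    4, 3,
    0, 2,
    0, 6),
  nrow = 2,
  dimnames = list(c("taxA", "taxB"),
                  c("A_rep1", "A_rep2", "B_rep1", "B_rep2"))
)
groups <- c("A", "A", "B", "B")

filtered <- filter_by_replicates(counts, groups)
filtered
```

In the toy table, taxB in biomaterial A is dropped because it appears in only one of the two replicates. Note that with `min_reps = 2` this sketch removes everything from a group that has a single replicate.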

sum reps

## If only one rep, will keep that rep
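
Summing replicates can be sketched with base R's `rowsum()`, assuming a taxa-by-samples count matrix and one group label (here the biomaterial) per column. `sum_replicates` is a hypothetical name used for illustration only.

```r
# Hypothetical helper: add up the read counts of all replicates that share a
# group label, giving one column per group.
sum_replicates <- function(counts, groups) {
  # rowsum() sums rows by group, so transpose to sum the sample columns
  t(rowsum(t(counts), group = groups))
}

# Toy counts with invented numbers
counts <- matrix(
  c(5, 0,
    4, 3,
    0, 2,
    0, 6),
  nrow = 2,
  dimnames = list(c("taxA", "taxB"),
                  c("A_rep1", "A_rep2", "B_rep1", "B_rep2"))
)

summed <- sum_replicates(counts, groups = c("A", "A", "B", "B"))
summed
```

Total reads are unchanged by this step; only the number of columns (samples) and possibly the number of detections shrink, because replicate columns are merged.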

apply second detection filter (possibly more appropriate at this point, after summing reps)

## Applying detection filter
## Using detection filter of 0 : reads removed: 0 from 193472 ; detections removed: 0 from 127
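
A detection filter of this kind can be sketched as follows. This assumes that a detection is dropped when its read count is below the threshold; whether bastools uses a strict or non-strict comparison is not shown in the log, so treat that as an assumption. `apply_detection_filter` is a hypothetical name.

```r
# Hypothetical helper: set every detection with fewer than `min_reads` reads
# to zero and report what was removed.
apply_detection_filter <- function(counts, min_reads) {
  below <- counts > 0 & counts < min_reads
  message("reads removed: ", sum(counts[below]), " from ", sum(counts),
          " ; detections removed: ", sum(below), " from ", sum(counts > 0))
  counts[below] <- 0
  counts
}

# Toy counts with invented numbers
counts <- matrix(
  c(25, 3,
    0, 19,
    20, 100),
  nrow = 2,
  dimnames = list(c("taxA", "taxB"), c("S1", "S2", "S3"))
)

filtered <- apply_detection_filter(counts, min_reads = 20)
unchanged <- apply_detection_filter(counts, min_reads = 0)
```

With a threshold of 0, as in this run (filter_dxn2 is 0), nothing can fall below it, so no reads or detections are removed.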

aggregate at chosen level and keep only that-level taxa

## Reminder: this changes 'unknown' and 'collapsed' to 'NA'
## Removing NAs and no_hits
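
The two sub-steps (aggregate at the chosen level, then keep only taxa resolved at that level) can be sketched like this. It assumes seven-level taxon paths separated by ";" with family at position 5, as in the paths printed in this report. `aggregate_at_level` is a hypothetical helper and the toy counts are invented.

```r
# Hypothetical helper: truncate each taxon path at `level`, sum the counts of
# paths that become identical, then drop paths not resolved at that level.
aggregate_at_level <- function(taxon, counts, level = 5, n_levels = 7) {
  parts <- strsplit(taxon, ";", fixed = TRUE)
  truncated <- vapply(parts, function(p) {
    p[p %in% c("unknown", "collapsed")] <- "NA"   # as noted in the reminder
    p[seq_len(n_levels) > level] <- "NA"          # blank out lower ranks
    paste(p, collapse = ";")
  }, character(1))
  summed <- rowsum(counts, group = truncated)
  # Keep only paths that have a name at the chosen level
  resolved <- vapply(strsplit(rownames(summed), ";", fixed = TRUE),
                     function(p) p[level] != "NA", logical(1))
  summed[resolved, , drop = FALSE]
}

taxon <- c(
  "Eukaryota;Streptophyta;Magnoliopsida;Rosales;Rosaceae;Rubus;NA",
  "Eukaryota;Streptophyta;Magnoliopsida;Rosales;Rosaceae;Prunus;NA",
  "Eukaryota;Streptophyta;unknown;unknown;NA;NA;NA"
)
counts <- matrix(c(10, 5, 7,
                   0, 2, 1),
                 nrow = 3, dimnames = list(NULL, c("S1", "S2")))

family_tab <- aggregate_at_level(taxon, counts, level = 5)
family_tab
```

The two Rosaceae genera collapse into a single family-level row, and the path with no family assignment is dropped.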

remove unwanted taxa for analysis

## Removing unwanted taxa

summary (pre-grouping): please inspect, add groupings to the config file as necessary and then look at the post-grouping table. Can more grouping be done? Is anything wrong?

## Skipping grouping of taxa

group taxa, where possible

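# Join each taxon in column 1 of taxa.to.group to the taxon named in column 2
if (is.null(taxa.to.group)) {
  message("Skipping grouping of taxa")
} else {
  # seq_len() avoids iterating over c(1, 0) when taxa.to.group has no rows
  for (i in seq_len(nrow(taxa.to.group))) {
    all.taxatabs.ss <- bas.group.taxa(
      taxatab = all.taxatabs.ss,
      taxon   = as.character(taxa.to.group[i, 1]),
      jointo  = as.character(taxa.to.group[i, 2])
    )
  }

  # Record summary statistics for this step
  stepcounter <- stepcounter + 1
  all.stats[[stepcounter]] <- taxatab.sumStats(all.taxatabs.ss, stepname = "group_taxa")
}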
## Skipping grouping of taxa

summary (post-grouping)

## Skipping grouping of taxa

some barplots

## Making barplots
## If column names are not ss_sample_ids using 'grouping' to specify what they are
## Using taxon as id variables
## Outputting as a list where first element is plot and second is legend
## (the three messages above are repeated 6 times in total)

PCA plot (just an example; not informative for this dataset)

## Plotting pca plot with lines
## Assuming grouping has already been done
## Principal Component Analysis plot of community simmilarity using Bray-Curtis distances
## Note: Ellipses will not be calculated if there are groups with too few data points
## ellipses are drawn with a confidence level of 0.90
## Too few points to calculate an ellipse (message repeated 19 times)
## Assuming grouping has already been done
## Principal Component Analysis plot of community simmilarity using Bray-Curtis distances
## Note: Ellipses will not be calculated if there are groups with too few data points
## ellipses are drawn with a confidence level of 0.90

## Plotting pca plot without lines
## Assuming grouping has already been done
## Principal Component Analysis plot of community simmilarity using Bray-Curtis distances
## Note: Ellipses will not be calculated if there are groups with too few data points
## ellipses are drawn with a confidence level of 0.90
## Too few points to calculate an ellipse (message repeated 19 times)
## Assuming grouping has already been done
## Principal Component Analysis plot of community simmilarity using Bray-Curtis distances
## Note: Ellipses will not be calculated if there are groups with too few data points
## ellipses are drawn with a confidence level of 0.90
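
For reference, a Bray-Curtis ordination can be sketched in base R as below. This is not the plotting function used above. It computes the Bray-Curtis dissimilarity by hand and ordinates it with `cmdscale()` (classical multidimensional scaling), which is an assumption about the kind of ordination meant by the message. `bray_curtis` is a hypothetical helper and the toy counts are invented.

```r
# Hypothetical helper: pairwise Bray-Curtis dissimilarity between samples.
# counts: samples x taxa matrix of read counts (no all-zero samples).
bray_curtis <- function(counts) {
  n <- nrow(counts)
  d <- matrix(0, n, n, dimnames = list(rownames(counts), rownames(counts)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(counts[i, ] - counts[j, ])) /
        sum(counts[i, ] + counts[j, ])
    }
  }
  as.dist(d)
}

toy <- rbind(S1 = c(10, 0, 0),
             S2 = c(10, 0, 0),
             S3 = c(0, 5, 5),
             S4 = c(2, 8, 0))

bc <- bray_curtis(toy)
coords <- cmdscale(bc, k = 2)   # two ordination axes, one row per sample
coords
```

Samples with identical composition (S1 and S2) have a dissimilarity of 0, and samples that share no taxa (S1 and S3) have a dissimilarity of 1.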

combine counts and taxalists

## Note that detections/samples after sumreps will be fewer because reps are joined
##    detections  reads taxa samples                step
## 1         481 264378  208      41               start
## 2         473 247750  207      41           rm.nohits
## 3         471 247738  206      41     rm.non-assigned
## 4         471 247738  206      41          rm.problem
## 5         471 247738  206      41             taxonpc
## 6         185 246091   52      41            samplepc
## 7         180 246026   51      40           dxnfilter
## 8         140 193472   50      37     rm.contaminants
## 9         127 193472   50      19             sumreps
## 10        127 193472   50      19          dxnfilter2
## 11        114 193472   34      19 aggregate_by_xLevel
## 12        111 189664   31      19    keep_only_xLevel
## 13        111 189664   31      19            unwanted
## The taxalist output needs more work, depends on collapsed taxa, need to split nto 2 tables, pre and post collapse.
##         Running pre-aggregate_by_xLevel only. Some clever way to plot this?
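
Each row of the table above could be produced by a summary along these lines. `sum_stats` is a hypothetical stand-in, not the `taxatab.sumStats` function used in the pipeline, and it assumes that taxa and samples are counted only when they still have reads.

```r
# Hypothetical helper: one summary row for a taxa x samples count matrix.
sum_stats <- function(counts, stepname) {
  data.frame(
    detections = sum(counts > 0),           # non-zero taxon/sample cells
    reads      = sum(counts),
    taxa       = sum(rowSums(counts) > 0),  # taxa with at least one read
    samples    = sum(colSums(counts) > 0),  # samples with at least one read
    step       = stepname
  )
}

# Toy counts with invented numbers
counts <- matrix(
  c(5, 0,
    4, 3,
    0, 2,
    0, 6),
  nrow = 2,
  dimnames = list(c("taxA", "taxB"),
                  c("A_rep1", "A_rep2", "B_rep1", "B_rep2"))
)

stats <- sum_stats(counts, stepname = "start")
stats
```

Collecting one such row per step and binding them with `rbind()` gives a table of the same shape as the one printed above.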

final taxa list

##  [1] Eukaryota;Streptophyta;Magnoliopsida;Apiales;Apiaceae;NA;NA              
##  [2] Eukaryota;Streptophyta;Magnoliopsida;Apiales;Araliaceae;NA;NA            
##  [3] Eukaryota;Streptophyta;Magnoliopsida;Asparagales;Amaryllidaceae;NA;NA    
##  [4] Eukaryota;Streptophyta;Magnoliopsida;Asterales;Asteraceae;NA;NA          
##  [5] Eukaryota;Streptophyta;Magnoliopsida;Brassicales;Brassicaceae;NA;NA      
##  [6] Eukaryota;Streptophyta;Magnoliopsida;Ericales;Actinidiaceae;NA;NA        
##  [7] Eukaryota;Streptophyta;Magnoliopsida;Ericales;Primulaceae;NA;NA          
##  [8] Eukaryota;Streptophyta;Magnoliopsida;Fabales;Fabaceae;NA;NA              
##  [9] Eukaryota;Streptophyta;Magnoliopsida;Fagales;Betulaceae;NA;NA            
## [10] Eukaryota;Streptophyta;Magnoliopsida;Fagales;Fagaceae;NA;NA              
## [11] Eukaryota;Streptophyta;Magnoliopsida;Fagales;Juglandaceae;NA;NA          
## [12] Eukaryota;Streptophyta;Magnoliopsida;Gentianales;Rubiaceae;NA;NA         
## [13] Eukaryota;Streptophyta;Magnoliopsida;Lamiales;Plantaginaceae;NA;NA       
## [14] Eukaryota;Streptophyta;Magnoliopsida;Laurales;Lauraceae;NA;NA            
## [15] Eukaryota;Streptophyta;Magnoliopsida;Malpighiales;Euphorbiaceae;NA;NA    
## [16] Eukaryota;Streptophyta;Magnoliopsida;Malpighiales;Salicaceae;NA;NA       
## [17] Eukaryota;Streptophyta;Magnoliopsida;Malvales;Malvaceae;NA;NA            
## [18] Eukaryota;Streptophyta;Magnoliopsida;Myrtales;Lythraceae;NA;NA           
## [19] Eukaryota;Streptophyta;Magnoliopsida;Poales;Cyperaceae;NA;NA             
## [20] Eukaryota;Streptophyta;Magnoliopsida;Poales;Poaceae;NA;NA                
## [21] Eukaryota;Streptophyta;Magnoliopsida;Rosales;Moraceae;NA;NA              
## [22] Eukaryota;Streptophyta;Magnoliopsida;Rosales;Rhamnaceae;NA;NA            
## [23] Eukaryota;Streptophyta;Magnoliopsida;Rosales;Rosaceae;NA;NA              
## [24] Eukaryota;Streptophyta;Magnoliopsida;Sapindales;Rutaceae;NA;NA           
## [25] Eukaryota;Streptophyta;Magnoliopsida;Sapindales;Simaroubaceae;NA;NA      
## [26] Eukaryota;Streptophyta;Magnoliopsida;Solanales;Solanaceae;NA;NA          
## [27] Eukaryota;Streptophyta;Magnoliopsida;Vitales;Vitaceae;NA;NA              
## [28] Eukaryota;Streptophyta;Magnoliopsida;Zingiberales;Musaceae;NA;NA         
## [29] Eukaryota;Streptophyta;Pinopsida;Pinales;Pinaceae;NA;NA                  
## [30] Eukaryota;Streptophyta;Polypodiopsida;Polypodiales;Athyriaceae;NA;NA     
## [31] Eukaryota;Streptophyta;Polypodiopsida;Polypodiales;Dennstaedtiaceae;NA;NA
## 31 Levels: Eukaryota;Streptophyta;Magnoliopsida;Apiales;Apiaceae;NA;NA ...

plot counts

## Note that detections/samples after sumreps will be fewer because reps are joined

a few final sentences would be good

write final table